MENet: A Mitscherlich function based ensemble of CNN models to classify lung cancer using CT scans

Lung cancer is one of the leading causes of cancer-related deaths worldwide. To reduce the mortality rate, early detection and proper treatment should be ensured. Computer-aided diagnosis methods analyze different modalities of medical images to increase diagnostic precision. In this paper, we propose an ensemble model, called the Mitscherlich function-based Ensemble Network (MENet), which combines the prediction probabilities obtained from three deep learning models, namely Xception, InceptionResNetV2, and MobileNetV2, to improve the accuracy of lung cancer prediction. The ensemble approach is based on the Mitscherlich function, which produces fuzzy ranks that are used to combine the outputs of these base classifiers. The proposed method is trained and tested on two publicly available lung cancer datasets, namely Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) and LIDC-IDRI, both of which are computed tomography (CT) scan datasets. The results, measured with standard metrics, show that the proposed method performs better than state-of-the-art methods. The code for the proposed work is available at https://github.com/SuryaMajumder/MENet.


Introduction
Lung cancer is cancer that starts in the lungs, usually in the cells that line the airways. It is one of the leading causes of cancer-related deaths globally. According to the World Health Organization (WHO), there were 2.21 million new cases of lung cancer in 2020 [1], making it the second most common type of cancer in terms of new cases. Smoking, exposure to certain chemicals and pollutants, and a family history of the disease are all risk factors for lung cancer. Cancer significantly affects a person's quality of life, as well as their physical and mental health. To reduce mortality and improve patients' quality of life, awareness of cancer prevention, early detection, and effective treatment options must be increased.
The use of computer-aided detection (CAD) methods to help medical workers in the early detection of diseases like lung cancer has grown in popularity in recent years. These methods analyze medical images using machine learning algorithms. By automating the diagnosis process, CAD systems can increase diagnostic precision and decrease the likelihood of human error. In recent years, several studies, such as [2][3][4], have applied machine learning methods to the early detection of lung cancer. These techniques generally train a machine learning algorithm on a collection of medical images of a particular modality and derive features that can be used to identify the presence or absence of lung cancer, or its severity.
Over the last few decades, Convolutional Neural Networks (CNNs) [5] have been highly effective in overcoming several difficulties related to image classification and pattern recognition. Unlike conventional machine learning algorithms, CNNs do not need feature-engineering methods; rather, they automatically extract the relevant features from the input data. CNNs, a subclass of deep neural networks, have transformed the field of computer vision by achieving cutting-edge results on a variety of visual recognition tasks, including image segmentation, object detection, and classification. CNNs are a powerful tool for many applications, but they are not a silver bullet and may require support from other methods to improve performance [6,7].
Also, training deep neural networks from scratch requires large amounts of data and computational resources, and can take a long time to converge. In some cases, it may be impractical or impossible to train a deep network from scratch due to these limitations. Transfer learning provides a solution [8,9]: it allows us to use pre-trained models that have been trained on large datasets and have learned useful feature representations. By transferring knowledge from one task to another, transfer learning can boost the performance of a single model. Ensembling multiple models can further enhance performance by lowering the variance and improving the models' capacity for generalization. By combining the decision-making abilities of varied models, ensemble learning can help build more reliable and precise models that are better able to handle real-world data. To improve the performance of identification systems, [10] suggested building a composite classification system from component classifiers learned from various groups. The popularity of ensemble learning has grown steadily in recent years [11][12][13][14][15][16][17]. Nowadays, ensemble learning is considered a key technique in the machine learning toolkit and is widely used in industry and academia. Depending on the specific application and the available data, several ensemble approaches, such as bagging [18], boosting [19], and stacking [20], can be used for classification tasks.
Considering these details, in this study, we hypothesize that ensemble learning can be an effective way to detect lung cancer, which involves identifying different types of lung abnormalities from medical computed tomography (CT) scans. This approach may help tackle a few challenges, such as image quality variations, noise, distortion, and low-density contrast, as illustrated in the accompanying figure. The main contributions of this work are as follows:
• We propose an ensemble model, called MENet, for lung cancer classification using CT scans.
• We also propose a fuzzy ranking system based on the Mitscherlich function to rank and combine the outputs of different base classifiers, forming an ensemble-based prediction model.
• Our proposed method is trained and tested on two publicly available lung CT scan datasets, namely IQ-OTH/NCCD and LIDC-IDRI.
• MENet outperforms existing methods in lung cancer prediction, with accuracies of 99.54% and 95.75% on the IQ-OTH/NCCD and LIDC-IDRI datasets, respectively.

Literature review
The paper's literature review has two distinct segments. The first segment presents a comprehensive summary of previous studies conducted on lung cancer classification. The second segment concentrates on examining ensemble techniques, specifically in the realm of medical image processing.

A review of lung cancer classification
In [21], the authors presented an approach for classifying lung nodules and non-nodules in CT images using texture features. They used three strategies to extract texture measurements: rose diagrams (RD), artificial crawlers (ACs), and a hybrid model that incorporates both ACs and RD. The authors employed a support vector machine (SVM) classifier with a radial basis kernel to differentiate between nodules and non-nodules in the candidate categorization process. The study used the Lung Image Database Consortium (LIDC-IDRI) image database and achieved a mean specificity of 94.78%, mean sensitivity of 91.86%, and mean accuracy of 94.30%. However, the main limitation of this work is that the comparison with previous methods is only approximate due to differences in databases and test scenarios. The only precise comparison is with [22], against which the proposed methodology achieved a lower specificity score, indicating a lesser ability to classify non-nodule cases accurately.
Marcin et al. [23] introduced a novel approach to classify lung carcinomas. This method involves the localization and extraction of lung nodules by computing the local variance of pixels, detecting the maxima, and using a probabilistic neural network as a classifier. The simplicity of the proposed method enables it to detect low-contrast nodules and reduce computational workload while maintaining performance. To describe nodule appearance without ignoring spatial information, Shaffie et al. [24] devised a 7th-order Markov Gibbs Random Field model and a Local Binary Pattern descriptor.
Netto et al. [25] proposed a methodology for analyzing lung lesions using temporal evaluation, which can aid in the diagnosis of indeterminate lesions during treatment. A modified quality threshold clustering technique was employed to assign each voxel of the lesion to a cluster, and the alteration in the lesion was evaluated by analyzing the movement of voxels to other clusters over time. Statistical features were extracted to differentiate between benign and malignant lesions. The authors employed two databases of pulmonary lesions, one for malignant lesions under treatment and the other for undetermined cases, to develop their methodology. By analyzing the density changes of lesions over time, the researchers achieved an accuracy of 98.41% in identifying lung lesions.
Xie et al. [26] proposed an approach for lung nodule classification named Fuse-TSD. This method combines information on texture, shape, and deep model-based learning at the decision level. The approach uses a texture descriptor derived from a gray-level co-occurrence matrix (GLCM), a Fourier shape descriptor that captures the heterogeneity of nodules, and a deep convolutional neural network (DCNN) that learns the feature representation of nodules automatically, slice by slice. An AdaBoosted back propagation neural network (BPNN) is trained on each type of feature, and the decisions of the three classifiers are combined to distinguish benign from malignant nodules.
An automated diagnosis classification method for CT lung images was introduced by [27]. The method employs an Optimal Deep Neural Network and Linear Discriminant Analysis to extract deep features and reduce dimensionality. To optimize the Optimal Deep Neural Network for classifying lung nodules as benign or malignant, the authors used a modified gravitational search algorithm. The automated approach not only enhances the classification accuracy but also reduces the time required for manual labeling and prevents human errors in recognizing normal and abnormal lung images. Shafi et al. [28] proposed a deep learning-supported SVM model for CT images and obtained an accuracy of 94%.
A lightweight, multi-view sampling-based multi-section CNN architecture was introduced by [29], which efficiently captures the structural information of nodules from CT scans for lung cancer diagnosis. The method uses a view pooling layer to aggregate information from multiple cross-sections of the nodule, encoding its volumetric information into a concise representation that is then used for nodule classification. Masood et al. [30] developed a decision support system using CT scans for lung nodule detection. Their approach used a 3D deep CNN, a multi-region proposal network, and median intensity projection to automatically identify regions of interest. The performance of the system was evaluated on the LUNA16, ANODE09, and LIDC-IDRI datasets. The main shortcoming of this work is that the accuracy of detecting micro-nodules with a diameter of less than 3 mm is relatively low.
Lin et al. [31] put forth a framework for classifying lung cancer using Generative Adversarial Networks (GANs) and discriminator networks. They used a deconvolution layer, the leaky Rectified Linear Unit (ReLU) activation function, and batch normalization to obtain a 64×64×3 image. The main objective of their research was to overcome the challenge of sparse image data and generate precise predictions. Zhao et al. [32] proposed a forward and backward GAN with multi-scale VGG16. Yuan et al. [33] proposed a CAD method to enhance the detection of pulmonary nodules on CT scans. Through multi-task learning, their method combined a 3D Residual U-Net model with a multi-branch classification network and achieved a detection sensitivity of 94.0% and a competition performance metric (CPM) score of 0.959 on the LUNA 2016 dataset. Bhatia et al. [34] obtained an accuracy of 84% using the same model on the LIDC-IDRI dataset. Halder et al. [35] presented an end-to-end system for detecting and classifying lung nodules from high-resolution CT images using atrous convolution. Their proposed architecture, ATCNN2PR, which consists of a two-layer atrous pyramid and residual connections, demonstrated the highest classification accuracy. The results showed that the system outperformed other competing frameworks with an accuracy, specificity, and sensitivity of 95.97%, 96.89%, and 95.84%, respectively.

Survey of ensemble techniques
Ensemble learning is a technique that combines multiple learning models to increase the overall accuracy and resilience of the learning framework. In this section, we give a brief overview of recently proposed ensemble learning approaches, with an emphasis on medical image analysis. Maji et al. [36] introduced a computational imaging system for the detection of blood vessels in fundus color images using ensemble learning and deep learning. Their approach involved training an ensemble of deep CNNs to segment vessel and non-vessel regions of a color fundus image. During inference, the responses of the individual ConvNets within the ensemble were averaged to generate the final segmentation. The method was evaluated using the DRIVE database, and the results showed a maximum average accuracy of 94.7%. Kassan et al. [37] presented an ensemble deep learning-based approach for binary classification of breast histopathology images. They used three pre-trained CNNs for feature extraction and a multi-layer perceptron classifier for the final classification. The proposed technique outperformed individual classifiers and other machine learning algorithms in terms of prediction accuracy on four benchmark datasets.
Bhowal et al. [38] developed a CAD system for breast cancer detection using a Choquet integral fusion technique that considers subsets of classifiers. For the categorization of breast cancer histology images, their method included InceptionV3, Xception, VGG19, VGG16, and InceptionResNetV2. To address the complexity of calculating fuzzy measures, the authors used coalition game theory and information theory, and constructed a novel mathematical function. The images of the ICIAR 2018 Grand Challenge on Breast Cancer Histology (BACH) were used for evaluation, which included 2-class and 4-class tasks. The fusion method outperformed all individual models, with a test accuracy of 96% for the two-class problem and 95% for the four-class problem. Table 1 shows some recent applications of ensemble learning in the field of medical imaging.
In our proposed ensemble approach, we use the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) [54] lung cancer dataset, which was collected over a period of three months in the fall of 2019 and published in 2020. The dataset contains CT scan images of healthy subjects and of patients diagnosed with lung cancer at different stages. This dataset is publicly available [55][56][57] and contains three classes of images. Among the three classes, the benign class contains 120 images, the malignant class contains 561 images, and the remaining 461 images are from normal patients, as shown in Table 2. We used 70% of the data to train the models, 20% for validation, and the remaining 10% for testing.
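The 70/20/10 partition described above can be sketched as follows. The text does not say whether the split was random or stratified, so the per-class (stratified) shuffle, the function name `stratified_split`, and the seed are all assumptions made only for illustration; the class counts are the ones quoted in the text.

```python
import random

def stratified_split(labels, train=0.7, val=0.2, seed=0):
    """Split sample indices into train/val/test, class by class.

    ASSUMPTION: stratification and the fixed seed are illustrative choices,
    not something stated in the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    split = {"train": [], "val": [], "test": []}
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(train * len(indices))
        n_val = round(val * len(indices))
        split["train"] += indices[:n_train]
        split["val"] += indices[n_train:n_train + n_val]
        split["test"] += indices[n_train + n_val:]  # whatever remains
    return split

# Class counts as quoted in the text (benign, malignant, normal).
labels = ["benign"] * 120 + ["malignant"] * 561 + ["normal"] * 461
parts = stratified_split(labels)
print({name: len(idx) for name, idx in parts.items()})
```

Splitting per class keeps the minority benign class represented in all three partitions, which matters when one class has only 120 images.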

Proposed methodology
The proposed ensemble-based lung cancer classification model (MENet) is described in depth in this section. We first briefly describe each base learner, which produces confidence scores for an incoming lung image. These scores are then fused via the proposed ensemble methodology to determine the class of the test sample.

Deep neural based classifiers
CNNs are preferred over other types of machine learning algorithms for image classification tasks, including the detection of lung cancer from medical imaging datasets. One of the main reasons for this is that CNNs are specifically designed to handle spatial data such as images. They learn features from the input data using a combination of convolutional and pooling layers, which effectively capture local patterns and spatial relationships between pixels. This makes them well suited for analyzing medical images, which often contain complex structures and patterns. Additionally, CNNs can automatically learn and adapt to the features of the input data without the need for manual feature engineering. Overall, these characteristics make CNNs a powerful tool for accurately classifying medical images, including those of lung cancer, with the potential to increase diagnostic precision and speed. After an exhaustive set of experiments, we settled on the Xception, InceptionResNetV2, and MobileNetV2 models. This choice of architectures balances the trade-off between accuracy and computational efficiency.
Table 1. Recent applications of ensemble learning in medical imaging.

Application domain                             Reference
Glioma detection                               [39]
COVID-19 detection                             [40][41][42][43]
Breast cancer                                  [44][45][46][47][48]
Colorectal polyp classification                [49]
Brain cancer detection                         [50]
Lymphoblastic leukemia cell image analysis     [51]
Alzheimer's disease classification             [52]
Monkeypox detection                            [53]

https://doi.org/10.1371/journal.pone.0298527.t001

Xception. Xception is a convolutional neural network architecture proposed by [58]. It is an extension of the Inception architecture and is named "Extreme Inception" because it replaces the standard convolutions used in Inception with depthwise separable convolutions. Blocks of depthwise separable convolution layers, each followed by batch normalization and ReLU activation, make up the bulk of the Xception architecture. A depthwise separable convolution is made up of two distinct layers: a depthwise convolution layer that applies a single convolutional filter to each input channel, and a pointwise convolution layer that applies a 1×1 convolutional filter to combine the output channels of the depthwise convolution. A global average pooling layer and a fully connected layer for classification constitute the final layers of the network. The depthwise separable convolutions are more computationally efficient than conventional convolutions and help reduce overfitting, and the residual connections enable deeper network topologies while addressing the vanishing gradient issue. This makes Xception a powerful and effective model for image classification tasks. The Xception architecture is illustrated in Fig 3.
InceptionResNetV2. Introduced by [59], InceptionResNetV2 combines the strengths of the Inception and ResNet architectures. It uses residual connections with Inception modules, which not only address the vanishing gradient problem but also enable more complex network designs. Additionally, the model requires fewer parameters to achieve high accuracy and can quickly learn and extract features from input data. Overall, the InceptionResNetV2 architecture is a powerful tool for image classification tasks, offering both high accuracy and efficient computation. The InceptionResNetV2 architecture is shown in Fig 4.
MobileNetV2. MobileNetV2, proposed by [60], is a convolutional neural network designed for mobile and embedded vision applications. It uses depthwise separable convolutions, inverted residuals, and linear bottlenecks to improve efficiency while preserving representational power. It includes two types of blocks: residual blocks with a stride of 1 and blocks with a stride of 2 for downsampling. Both types of blocks consist of three layers. The first layer in each block is a 1×1 convolution with ReLU6 activation. The second layer is a depthwise convolution, which applies a separate convolutional filter to each input channel. The final layer in each block is another 1×1 convolution, but without any non-linearity. A width multiplier is used to adapt the network to different hardware and resource constraints. MobileNetV2 is a highly efficient and lightweight model, making it suitable for deployment on

Proposed fuzzy-ensemble method
The main goal of our proposed approach is to provide greater adaptability and freedom in handling datasets with varying degrees of complexity. Using a fuzzy ranking-based approach, we can consider the uncertainty of each classifier's predictions and assign different levels of importance to each classifier based on its performance on a particular test case. In the proposed methodology, the three CNN-based classifiers (Xception, InceptionResNetV2, and MobileNetV2) are used to detect lung cancer cases from CT scans, and fuzzy ranks are generated for each of them using the re-parameterized Mitscherlich function. To increase the overall accuracy of the classification, these fuzzy ranks are then combined using an ensemble method. The Mitscherlich function, which has been applied to the field of machine learning by [61,62], is based on the idea of crop yield response [63] to various levels of fertilizer application [64].
Here, it is used to combine the outputs of different models with various strengths and weaknesses.
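For orientation, the classical Mitscherlich yield law from agronomy is commonly written in the form below. The symbols are the conventional textbook ones and are not taken from this paper: y is the crop yield, x the applied fertilizer dose, A the maximum attainable yield, c a response coefficient, and b the nutrient already available in the soil. The re-parameterized variant used for fuzzy ranking is the one the paper defines in Eq 2, which may differ from this generic form.

```latex
y = A\left(1 - e^{-c\,(x + b)}\right)
```

The curve saturates as x grows, so equal increments in the input produce progressively smaller gains in the output.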

Significance of Mitscherlich function
The Mitscherlich function plays a pivotal role in our rank-based fuzzy ensemble logic. It was selected for several reasons, which we discuss in this section. The Mitscherlich function serves as a tool for describing the performance of each model within our ensemble as a function of input strength. Because the function resembles a linear mapping within the (0,1) range, we provide a more comprehensive explanation of our rationale and support it with relevant discussion and experiments.
In essence, the Mitscherlich function allows us to create a dose-response curve that portrays each model's performance with respect to the input strength. This curve represents the relationship between the weighted sum of the models' predictions and the actual outcomes, yielding a value that quantifies the fit between predictions and actual results. We employ this function to generate fuzzy ranks for each class from the ensemble's individual models. The process involves ranking the confidence scores of each class provided by the base classifiers. These fuzzy ranks are then employed in ensemble predictions on a validation set, specifically within the top K ranks. This enables us to construct an adaptable ensemble model capable of dynamically adjusting the ranking of individual models based on the specific characteristics of each input instance.
As the decision score for a class that is accurately predicted by a classifier approaches one, the Mitscherlich function drops steeply within the domain range of 0 to 1. This characteristic is highly advantageous when forming an ensemble of decision scores from various learning models. It ensures that as we approach a confident prediction (a score near one), the response of the Mitscherlich function is highly sensitive, effectively distinguishing between models with slight performance variations in this critical region.

Implementation of the ensemble model
Suppose there are M different decision scores (also known as the confidence scores of the classifiers), $CoF^{(1)}, CoF^{(2)}, \ldots, CoF^{(M)}$, for each input image P. As we use three different CNN-based models to produce the confidence scores on the dataset, M equals 3 in our case. In Eq 1, the decision scores are normalized, where C is the total number of classes in the dataset under consideration.
Fuzzy ranks are generated from the confidence scores of each sample in the dataset, whose samples are divided into three classes. The Mitscherlich function produces the fuzzy rank $R_c^{(i)}$ for a class c from the confidence scores of the i-th classifier, as shown in Eq 2.
Fig 6 shows the graph of the modified Mitscherlich function given in Eq 2. The value of $R_c^{(i)}$ lies between 0 and 1, with the lowest value, 0, corresponding to rank 1 (the best rank); that is, a higher confidence score yields a lower (better) rank value. If $K^{(i)}$ denotes the top k ranks of the i-th classifier, that is, ranks 1, 2, ..., k, then the fuzzy rank sum ($FRS_c$) and the complement of the confidence factor sum ($CCFS_c$) for class c are computed as follows:

\[ FDS_c = FRS_c \times CCFS_c \tag{5} \]
The final predicted class for data instance I is the class with the lowest FDS value, as shown in Eq 6.

\[ \operatorname{class}(I) = \arg\min_{c \in \{1, \ldots, C\}} FDS_c \tag{6} \]
A dry run of the proposed fuzzy rank-based ensemble method is shown under S1 Appendix.
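The fusion pipeline described in this section (normalize, rank, restrict to the top k ranks, multiply FRS by CCFS, take the minimum) can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation, which is in the linked repository. In particular, the rank function `2 * (1 - 2**(cof - 1))` is an assumed re-parameterization chosen because it matches the properties stated in the text (it maps a confidence of 1 to rank 0, a confidence of 0 to rank 1, and falls fastest near 1); the penalty values applied outside the top k ranks, the averaging inside CCFS, and all function names are hypothetical.

```python
def normalize(scores):
    """Scale one classifier's class scores so that they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

def fuzzy_rank(cof):
    """ASSUMED rank function: 0 is the best rank, 1 the worst.

    Decreasing in cof and steepest near cof = 1, as the text describes.
    """
    return 2.0 * (1.0 - 2.0 ** (cof - 1.0))

def fuse(score_table, k=2, rank_penalty=1.0, cof_penalty=0.0):
    """Fuse confidence scores; score_table[i][c] is classifier i, class c.

    rank_penalty and cof_penalty are HYPOTHETICAL values applied to classes
    that fall outside a classifier's top-k ranks.
    Returns (predicted_class, fds_per_class).
    """
    m = len(score_table)
    n_classes = len(score_table[0])
    frs = [0.0] * n_classes       # fuzzy rank sum per class
    cof_sum = [0.0] * n_classes   # accumulated confidence per class

    for raw in score_table:
        cof = normalize(raw)
        ranks = [fuzzy_rank(x) for x in cof]
        # Classes holding the k lowest (best) fuzzy ranks for this classifier.
        top_k = set(sorted(range(n_classes), key=lambda c: ranks[c])[:k])
        for c in range(n_classes):
            if c in top_k:
                frs[c] += ranks[c]
                cof_sum[c] += cof[c]
            else:
                frs[c] += rank_penalty
                cof_sum[c] += cof_penalty

    # Complement of the mean confidence (assumed form of CCFS).
    ccfs = [1.0 - s / m for s in cof_sum]
    fds = [f * cc for f, cc in zip(frs, ccfs)]
    predicted = min(range(n_classes), key=lambda c: fds[c])
    return predicted, fds

# Toy scores for three classifiers over three classes.
scores = [
    [0.10, 0.85, 0.05],
    [0.20, 0.70, 0.10],
    [0.50, 0.40, 0.10],
]
predicted, fds = fuse(scores)
print(predicted, [round(v, 4) for v in fds])
```

In this toy case the third classifier prefers class 0, but the two confident votes for class 1 give it both the lowest rank sum and the lowest complement term, so class 1 has the smallest product and is selected.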

Results and discussion
In this section, we report the detailed results and the corresponding analysis of the proposed ensemble of CNN models used for lung cancer detection in CT scans. The distribution of images in the dataset is already provided in Table 2. The implications of the obtained results are also discussed. In addition, we present a comparative evaluation to show that the proposed method outperforms other models and commonly used ensemble techniques implemented in the literature.

System configuration
All experiments were run in a Jupyter environment on Google Colab, equipped with a 12 GB NVIDIA Tesla T4 GPU. The implementation uses a Python 3 environment along with open-source modules such as TensorFlow, Keras, Matplotlib, scikit-learn, NumPy, and Pandas.

Evaluation metrics
Evaluation metrics are important in assessing the effectiveness and strength of a predictive learning model. It is important to use a variety of standard evaluation metrics to get a complete picture of the model's performance and to ensure that it meets the requirements of the problem under consideration. In classification problems, these metrics are used to evaluate the performance in predicting the correct class label for a given input. Table 3 shows the performance measures used in our classification method. Consider a two-class classification problem, where one class is termed 'positive' and the other 'negative'. Most of these measures are computed from a confusion matrix using four fundamental counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
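As an illustration of how such measures follow from the four confusion-matrix counts, the sketch below computes the standard definitions of accuracy, precision, recall (sensitivity), specificity, and F1-score. The exact set of measures listed in Table 3 is not reproduced in this text, so the selection here is an assumption, and the function name `binary_metrics` and the sample counts are purely illustrative.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard two-class metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Made-up counts, used only to exercise the formulas.
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class problem such as the three-class IQ-OTH/NCCD task, these quantities are usually computed per class in a one-vs-rest fashion and then averaged.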

Implementation
Initially, we perform extensive experimentation with different combinations of CNN models to determine the best combination of base learners for our proposed ensemble technique. The hyperparameters selected for this experiment are listed in Table 4. The results of the experimentation are shown in Table 5.
The best accuracy is obtained by selecting Xception, InceptionResNetV2, and MobileNetV2 as the base classifiers for the ensemble approach. The metric scores of these models, along with some additional information, are shown in Table 6. These three models, i.e., Xception, InceptionResNetV2, and MobileNetV2, achieve accuracy scores of 99.02%, 97.26%, and 99.45%, respectively. The results show that these models are reliable choices for this fuzzy ensemble. Each base model generates confidence scores for every class of every image in the dataset, and these scores are stored separately for each classifier. For the results reported in Table 6, each of the three transfer-learned models was trained individually for 60 epochs with the Adam optimizer. Fig 7 shows their respective confusion matrices on the dataset used. Although there are some misclassifications, the ratio of misclassifications to correct classifications is very low, indicating that these models are reliable for this task. The loss curves obtained by each of the base learners (or classifiers) are shown in Fig 8. As all our models are transfer learning-based models pre-trained on the ImageNet dataset, we only had to fine-tune them on the IQ-OTH/NCCD lung cancer dataset. The loss curves show hardly any sign of overfitting in the three models. The accuracy curves in Fig 9 correspond to the accuracy scores reported in Table 6. After fusing the confidence scores of the three models using the proposed Mitscherlich function-based ensemble model, we obtain the final results.
From Figs 10 and 11 and Table 7, we can observe that the overall accuracy achieved is 99.54% after combining the results of the three base classifiers. The class-wise results are also given in Table 7. All these high classification accuracies indicate that the model is

Comparison with state-of-the-art methods
We evaluated our model using the proposed Mitscherlich function-based fuzzy ranking ensemble approach. Fuzzy ranks are generated for each of the base classifiers, and these ranks are then combined using the ensemble method. The proposed method is compared with methods found in the literature, and it achieves the highest accuracy score among them. All compared methods were evaluated on the same dataset, i.e., the IQ-OTH/NCCD lung cancer dataset. Table 8 presents the comparative study of our proposed model against the others.
Our proposed ensemble approach achieves the best accuracy, 99.54%, compared to the other methods and the individual classifiers. Of these, the methods proposed by [57,65,66] are based on deep CNNs and achieve accuracies of 95.71%, 97.00%, and 99.45%, respectively. [67] have attained 98.83% accuracy by using three transfer learning

Comparison with other ensemble techniques
In this section, experimental results are compiled in Table 9 to demonstrate the superiority of the proposed ensemble scheme over well-known traditional ensemble techniques. All ensembles used the same three base CNN learners: Xception, InceptionResNetV2, and MobileNetV2. The proposed ensemble method performed better than several popular ensemble schemes. The results show that the weighted average ensemble, which takes only the accuracy metric into account when determining the weights, came closest to matching the performance of the proposed ensemble technique. In the majority voting-based ensemble, the class that receives the most votes from the base learners is predicted as the class of the sample.
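For reference, the two baselines named above can be sketched in a few lines. This is a minimal illustration, not the code behind Table 9: the function names are hypothetical, the toy confidence scores are made up, and the weights simply reuse the base-model accuracies quoted earlier in the text (99.02%, 97.26%, 99.45%) to mirror the accuracy-based weighting that the paragraph describes.

```python
from collections import Counter

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda c: values[c])

def majority_vote(score_table):
    """Each classifier votes for its top class; the most voted class wins.

    Ties are resolved in favor of the class that was voted for first.
    """
    votes = Counter(argmax(scores) for scores in score_table)
    return votes.most_common(1)[0][0]

def weighted_average(score_table, weights):
    """Average class scores across classifiers using the given weights."""
    total = sum(weights)
    n_classes = len(score_table[0])
    fused = [
        sum(w * scores[c] for w, scores in zip(weights, score_table)) / total
        for c in range(n_classes)
    ]
    return argmax(fused), fused

# Toy confidence scores: rows are classifiers, columns are classes.
scores = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
accuracy_weights = [0.9902, 0.9726, 0.9945]

vote_class = majority_vote(scores)
avg_class, fused = weighted_average(scores, accuracy_weights)
print(vote_class, avg_class, [round(v, 4) for v in fused])
```

Both baselines apply the same rule to every sample, whereas the fuzzy-rank approach re-weights the classifiers per sample through their confidence scores.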

Data visualization
In this section, we use two data visualization tools, namely GradCAM and t-SNE plots, to visually present some results of the proposed method.
GradCAM analysis. In this study, we have used GradCAM, a technique that creates a gradient-weighted class activation map as proposed by [70], to produce visual explanations of the model predictions. These visualizations help demonstrate how neural networks arrive at decisions. As shown in Figs 13-15, we have used GradCAM to generate visualizations for the Malignant, Benign, and Normal lung images, respectively, from the IQ-OTH/NCCD dataset, using the three base models that constitute the ensemble model. It is evident that the various models concentrate on distinct areas of the lung CT scans, suggesting that individual learners capture unique and complementary information that is required to
dimensional Euclidean distances among the data points into conditional probability scores that indicate similarities. This is achieved using SNE (Stochastic Neighbor Embedding) on the data points. The conditional probability $P_{j|i}$, which denotes the similarity of data point $x_j$ to data point $x_i$, is defined using Eq 7.
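The conditional probability used by SNE has a standard definition: a Gaussian kernel centered on $x_i$, normalized over all other points, i.e. $P_{j|i} = \exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2) / \sum_{k \neq i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)$. The sketch below evaluates that standard form on toy data. It assumes a single fixed bandwidth `sigma`, whereas t-SNE tunes a bandwidth per point from a target perplexity; the function name and the sample points are hypothetical.

```python
import math

def conditional_probabilities(points, i, sigma=1.0):
    """Return p_{j|i} for every j, with p_{i|i} set to 0.

    ASSUMPTION: one fixed bandwidth sigma for all points; real t-SNE
    searches for a per-point sigma_i that matches a chosen perplexity.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    weights = [
        0.0 if j == i
        else math.exp(-squared_distance(points[i], p) / (2.0 * sigma ** 2))
        for j, p in enumerate(points)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Three toy points on a line: the nearer neighbour gets the larger share.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
probs = conditional_probabilities(points, i=0)
print([round(p, 6) for p in probs])
```

Nearby points receive most of the probability mass, which is why samples with similar features end up in the same cluster of the low-dimensional plot.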
We observe that the images corresponding to the different stages of lung cancer are clearly separated into different clusters of points. The t-SNE plot visualizations for the Benign, Malignant, and Normal classes of lung nodule CT images from the IQ-OTH/NCCD dataset are presented in the first three images of Fig 16. These visualizations have been generated using the three selected base models that form the ensemble. The t-SNE plot of the ensemble model is presented in the fourth image of Fig 16.

Empirical significance of the ensemble model
In this section, we report a few outlier cases identified during model testing and analysis. There are cases where the proposed model has performed well even though one or two base classifiers have wrongly classified the sample. Figs 17 and 18 highlight cases where, for a benign image and a malignant image, respectively, two of our base models have given the correct answer and one has not, yet our proposed ensemble model still gives the correct result. Fig 19 highlights a case where, for a normal image, only one of our base models gives the correct result, while the other two do not. In this case too, our proposed ensemble method gives the correct result. These empirical results support the effectiveness of the proposed ensemble model in varied scenarios.

Statistical analysis
We performed McNemar's test, a non-parametric statistical test, to determine the statistical significance of the results obtained by the proposed ensemble method. The ensemble result is compared against the results of the three base models, namely Xception, InceptionResNetV2, and MobileNetV2, on the IQ-OTHNCCD dataset used in this study. The probability scores are used to design the ensemble model. In this case, the null hypothesis is that there is no significant difference between the performance of the ensemble and that of each base CNN model in predicting lung cancer outcomes. The results of McNemar's test are presented in Table 10, and the p-value for the dataset used in this study is found to be less than 0.05 (5%), indicating statistical evidence against the null hypothesis.
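A minimal sketch of McNemar's test for one ensemble-versus-base-model comparison is given below. The text does not state which variant was used, so the chi-square form with continuity correction is an assumption, as are the function name and the toy labels. Only the discordant pairs (samples that exactly one of the two models classifies correctly) enter the statistic.

```python
import math

def mcnemar_test(y_true, pred_a, pred_b):
    """Chi-square McNemar test with continuity correction (assumed variant)."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0                               # no discordant pairs
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy example: the ensemble is right on all 20 samples, the base model on only 10.
toy_true = [0] * 20
toy_ensemble = [0] * 20
toy_base = [0] * 10 + [1] * 10
stat, p_value = mcnemar_test(toy_true, toy_ensemble, toy_base)
```

When the number of discordant pairs is small, an exact binomial version of the test is usually preferred over the chi-square approximation.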

Additional experiments
To further assess the generalizability of the results, we have evaluated our proposed model on the LIDC-IDRI [72] dataset, which is also a popular publicly available dataset. The dataset is pre-split into train, test, and validation sets, each containing CT images for two classes, Benign and Malignant. Table 11 provides a detailed description of the distribution of the dataset. Each of the base models generates confidence scores for every class for every image in the dataset, and these scores are stored separately for each base model. For the results reported in Table 12, the three transfer-learned models have been trained individually for 60 epochs with the Adam optimizer.
From Table 13, we can observe that the overall accuracy achieved after combining the results of the three base classifiers is 95.75%. The class-wise results are also given in Table 13. Together, these classification scores indicate that the model performs reliably on this dataset.

Comparison with past methods
The proposed method is compared with the methods found in the literature review, and it achieves the highest accuracy score among them. All the compared methods have been evaluated on the same dataset, i.e., the LIDC-IDRI lung cancer dataset. Table 14 presents the comparative study of our proposed model with the others.

Conclusion and future scope
In recent times, it has been observed that CAD systems can improve diagnostic precision and decrease the likelihood of human errors by automating the diagnosis process. In the present work, we have designed an ensemble model, called MENet, using three transfer learning-based CNN models, namely Xception, InceptionResNetV2, and MobileNetV2, to enhance the accuracy of a lung cancer prediction model. In doing so, we have used a fuzzy ranking-based approach, which considers the uncertainty of each classifier's predictions and assigns different levels of importance to each classifier. The fuzzy ranking system is designed based on the Mitscherlich function, which combines the outputs of the base classifiers to form a final prediction model that is more accurate than any individual classifier. The proposed method has been evaluated on two lung CT scan datasets, namely IQ-OTHNCCD and LIDC-IDRI, and the obtained results are better than those of many recently proposed methods.

However, there are some false positives and false negatives, and these are significant challenges in the medical field, as they directly affect the treatment of patients. Hence, in the future, we need to reduce such errors. For this purpose, we may apply attention mechanisms to the base CNN models, which might help to generate better feature maps by focusing on the important regions and, in turn, produce a better prediction model. We would also like to explore lightweight CNN models to make the overall model useful in practical cases. Moreover, we recognize the importance of thoroughly analyzing and mitigating potential problems associated with our proposed method. Future work will involve a comprehensive examination of challenges ranging from interpretability issues to scalability concerns. Developing strategies to address these challenges will contribute to the robustness and dependability of MENet across diverse medical settings. Considering the evolving nature of medical datasets, the model's adaptability to new information and potential concept drift can also be assessed as future work.
Fig 1. The overall pipeline of the proposed model is represented in the diagram presented in Fig 2. The main contributions of this work are as follows:

Fig 5. Architecture of the MobileNetV2 model. https://doi.org/10.1371/journal.pone.0298527.g005

If class $c$ does not fall among the top $k$ class ranks, penalties $P^{R}_{c}$ and $P^{CoF}_{c}$ are applied to it. Using the aforementioned Mitscherlich function, the value of $P^{R}_{c}$ is set to 1, which is obtained by setting $CoF^{(i)}_{c} = 0$, and the value of $P^{CoF}_{c}$ is set to 0. These penalty values prevent class $c$ from emerging as an improbable winner. The final predictions of the ensemble model are produced from the decision score obtained by combining $FRS_c$ and $CCFS_c$. Eq 5 determines the final decision score (FDS).
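The penalty step can be illustrated with the following sketch. Several parts are assumptions, because this passage does not reproduce the rank function or Eq 5. The Mitscherlich-type form `2 * (1 - 2**(x - 1))` is assumed only because it yields a rank of 1 at a confidence of 0, matching the stated penalty. FRS is taken as the sum of ranks over the base models, CCFS as one minus the mean confidence, FDS as their product, and the winning class as the one with the smallest FDS. All function names are hypothetical.

```python
import numpy as np

def rank_fn(conf):
    """ASSUMED Mitscherlich-type fuzzy rank: 1 at confidence 0, 0 at confidence 1."""
    return 2.0 * (1.0 - 2.0 ** (conf - 1.0))

def fuse(probs, k=2):
    """probs: (n_models, n_classes) confidence scores for one image."""
    probs = np.asarray(probs, dtype=float)
    # Mark the top-k classes of every base model.
    order = np.argsort(-probs, axis=1)
    top_k = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(top_k, order[:, :k], True, axis=1)
    # Outside the top k: rank penalty rank_fn(0) = 1, confidence penalty 0.
    ranks = np.where(top_k, rank_fn(probs), rank_fn(0.0))
    confs = np.where(top_k, probs, 0.0)
    frs = ranks.sum(axis=0)                 # fuzzy rank sum (assumed: plain sum)
    ccfs = 1.0 - confs.mean(axis=0)         # complement of mean confidence (assumed)
    fds = frs * ccfs                        # final decision score (assumed: product)
    return int(np.argmin(fds)), fds         # assumed: lowest score wins

# Toy scores: class 2 is never among any model's top 2 classes.
toy_scores = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.15, 0.05]]
pred, fds = fuse(toy_scores, k=2)
```

Under these assumptions, a class that no base model places in its top $k$ receives the largest possible score, so it cannot win the argmin.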

Fig 12. Results of the models depicting their overall accuracy, precision, and recall on the IQ-OTHNCCD dataset: (a) Overall accuracy of the models, (b) Overall precision of the models, (c) Overall recall of the models. https://doi.org/10.1371/journal.pone.0298527.g012

Fig 16. Graphical representation of the t-SNE plots of samples of the IQ-OTHNCCD dataset with the three base models and the final ensemble model. https://doi.org/10.1371/journal.pone.0298527.g016

Fig 17. Demonstration of the case where two base models give the correct result, but the third one gives an erroneous result, for a Benign image. https://doi.org/10.1371/journal.pone.0298527.g017

Fig 19. Demonstration of the case where one base model gives the correct result, but the remaining two give erroneous results, for a Normal image. https://doi.org/10.1371/journal.pone.0298527.g019

We use class weights to address the class imbalance in the dataset. To prevent overfitting, we use early stopping while training the model. Fig 20 depicts the loss and accuracy curves of the individual base models along with their confusion matrices and ROC-AUC curves on the test dataset. We can observe from Fig 20 that the loss curves still show a slight tendency toward overfitting, but overall they show that our models are able to learn properly, with decreasing loss values. The accuracy curves show the high scores obtained by our model. The confusion

Fig 22. Results of the models depicting their overall accuracy, precision, and recall on the LIDC-IDRI dataset: (a) Overall accuracy of the models, (b) Overall precision of the models, (c) Overall recall of the models. https://doi.org/10.1371/journal.pone.0298527.g022

Table 5. Results of experiments implemented to determine the base classifiers for forming the ensemble in this study.
https://doi.org/10.1371/journal.pone.0298527.t005

Table 9. Performance comparison of the proposed ensemble method with some commonly used ensemble methods evaluated on the IQ-OTHNCCD dataset.
Scores are in %.